Introduction
Since the start of the Industrial Revolution, atmospheric carbon dioxide levels have spiked upward due to practices like deforestation and the burning of fossil fuels, causing an ever increasing amount of carbon dioxide to dissolve into the ocean. Once dissolved, carbon dioxide undergoes a series of chemical equilibrium reactions that ultimately lowers the pH of the water in a process known as ocean acidification. This process has numerous ramifications on marine ecosystems, including but not limited to the impairment of specific pH dependent biological processes and the leaching of calcium carbonate required to build the shells and skeletons of many marine species.
This project is sponsored by California Cooperative Oceanic Fisheries Investigations (CalCOFI), an organization founded in 1949 to study the ecological aspects of the Pacific sardine collapse off of the coast of California. CalCOFI is committed to studying California’s coastal marine environment and collecting relevant oceanographic data in order to provide insight on important climate change related topics such as renewable energy, integrated ocean management, and marine spatial planning.
Problems of Interest
Our goal is to study the impact of ocean acidification on zooplankton and krill biovolumes off of the California coast. First, we want to conduct a cross-comparison of bottle data and biological data to assess the amount of spatial and temporal overlap present. Using co-located measurements, we wish to model the effects of ocean acidification, using carbonate chemistry and oceanographic variables, on zooplankton and krill abundance. Specifically, we wish to identify the key environmental drivers contributing to plankton variability in a dynamic ocean environment. In addition, we are interested in exploring how pH and related environmental factors affect the abundance of calcifying versus non-calcifying species.
Data
Bottle Data
CalCOFI has collected environmental and hydrographic data for over 70 years during their quarterly cruises to their CalCOFI stations. We used the merged data from Part I which contains oceanographic and carbonate chemistry data as well as computed CO2SYS values for variables such as \(pH\), \(CO_3\), and \(\Omega_{aragonite}\).
There are three datasets that we are focusing on for our biological data: (1) CalCOFI NOAA Zooplankton Volume, (2) BTEDB (Krill) Abundances, and (3) PRPOOS Data (for Zooplankton Calcifiers/Non-Calcifiers Abundance). The zooplankton and krill biovolume data are obtained using net tows (Bongo and/or Pairovet) at each standard CalCOFI station. The PRPOOS data is also obtained using a net tow but rather than sampling at station, it is conducted during transits between stations. These three datasets have each been merged with the bottle data to create zoop_data/zooplankton_pH.csv, krill_data/CV_merged_krill.csv, and PRPOOS/prpoos_summary.csv, respectively. Refer to Methods section for the merging process.
CalCOFI NOAA Zooplankton Volume:
The zooplankton biovolume data measures the amount of “plankton” (the small and microscopic organisms floating in the sea, consisting chiefly of diatoms, protozoans, small crustaceans, and the eggs and larval stages of larger animals) in the volume of sea water sampled. In particular, we are interested in the variables total_plankton and small_plankton.
BTEDB (Krill) Abundances:
The krill dataset provides information on krill abundance from the Brinton and Townsend Euphausiid Database (BTEDB). The samples collected include species such as Euphausia pacifica, Nematoscelis difficilis, and Thysanoessa spinifera, with individuals categorized by size and developmental phase (e.g., calyptopis, furcilia, juvenile, adult).
PRPOOS (Calcifiers/Non-Calcifiers):
The PRPOOS (Planktonic Rate Processes in Oligotrophic Ocean Systems) dataset contains abundance and estimated biomass values for various zooplankton taxa, which can be categorized into calcifying and non-calcifying groups. The calcifying taxa are defined as byrozoan larvae, pteropoda heteropoda, ostracods, and rhizaria; the remaining taxa are considered non-calcifying.
Key Definitions
| Total Alkalinity (TA) |
A measure of how much alkaline (basic) material is in the water, or essentially the buffering capacity or ability to resist acidification. |
| Total Dissolved Inorganic Carbon (DIC) |
The sum of all dissolved inorganic carbon species in the water (carbon dioxide CO[2], bicarbonate ions [HCO[3]^–], and carbonate ions [CO[3]^2–]). |
| pCO[2] |
The partial pressure of carbon dioxide, a measure of CO[2] concentration. |
| CO^3 |
Carbonate ion concentration |
| Nitrate, Silicate, and Phosphate |
Key macronutrients in marine ecosystems that are vital to phytoplankton growth. |
| Ω[calcite] and Ω[aragonite] |
Carbonate saturation states, a measurement of how likely calcium carbonate (CaCO[3]) is to dissolve or form in the ocean. |
| Revelle Factor (RF) |
A measure of the buffering capacity of seawater, it is a measurement of how easily the ocean absorbs carbon dioxide from the atmosphere. |
Table 1: Description of variables used in models.
Methods
Summary of Merged Datasets
|
Bottle* |
NOAA Zooplankton
|
BTEDB Krill
|
PRPOOS Calcifiers
|
| # of Observations |
4125 |
45310 |
434 |
7482 |
70 |
1384 |
388 |
| # of Stations |
51 |
3172 |
28 |
411 |
21 |
20 |
16 |
| Start Year |
1983 |
1951 |
1987 |
1951 |
1984 |
2005 |
2009 |
| End Year |
2021 |
2023 |
2021 |
2019 |
2019 |
2025 |
2021 |
| *Note temporal gap from 2002-2007. |
Table 2: Summary of the size, spatial coverage, and temporal range of the merged datasets used in this analysis.
To merge the NOAA Zooplankton data (zoop_data/Zooplankton-new.csv) with the bottle data (merged_bottle_co2sys.csv), we performed an inner join on Station ID, month, year, and average monthly DIC to create zoop_data/zooplankton_pH.csv. The same merge process was used for the BTEDB Krill data (krill_data/BTEDB_Abundances.csv) to create krill_data/CV_merged_krill.csv).
To merge the PRPOOS data (PRPOOS/PRPOOS_all.csv) with the bottle data (merged_bottle_co2sys.csv), we performed a left join on Station ID, month, and year. Then, we collapsed the overlapping observations by calculating summary statistics for pH, \(\Omega_{aragonite}\), \(\Omega_{calcium}\), and other relevant carbonate chemistry variables to create PRPOOS/prpoos_summary.csv.
Generalized Additive Models (GAMs)
To model the response variables total_plankton and small_plankton in the NOAA dataset, we used GAMs with smoothing terms and spatial splines. We applied a log(x+1) transformation to both response variables. The following variables were used as predictors: spatial coordinates (longitude and latitude), oceanographic variables (salinity and temperature), temporal variables (month and year), carbonate chemistry variables (TA, DIC, pH, pCO2, RF, CO3, OmegaCA, and OmegaAR), and key macronutrients (nitrate, silicate, and phosphate).
To further examine spatial and chemical interactions, we constructed separate GAMs for nearshore and offshore regions using tensor product smooths between pH and both CO₃²⁻ and OmegaAR. This allowed us to assess how carbonate chemistry interacts with pH differently across regions to influence plankton abundance.
PCA and k-Means Clustering
Additionally, we performed a Principal Component Analysis (PCA) on species-level abundance data followed by k-means clustering to explore patterns in community composition. Clusters identified from the PCA were visualized in reduced dimensional space to reveal distinct plankton assemblages.
Interpolating Depth of Saturation Horizon and pH
To visualize the effects of pH on plankton abundance, we first had to find the depth of the saturation horizon for aragonite, which is the boundary between supersaturated conditions above and undersaturated conditions below. We are specifically looking at aragonite because it is a mineral form of calcium carbonate. To compute the saturation depth, we interpolate across depths to find where \(\Omega_{aragonite}\) = 1. Then, to compute the pH value we interpolate again to find the pH at that threshold depth. We use the dataset merged_bottle_co2sys.csv. Yearly data was used to estimate the saturation horizon because it provides more stable estimates and then seasonal data was used to estimate the pH.
Results
NOAA Zooplankton
GAM for Total Plankton
The model has an adjusted \(R^2\) of 0.32 with an AIC of 1095. There are eight significant variables at the \(\alpha=0.05\) level: longitude and latitude, TA, DIC, salinity, temperature, month, CO3, silicate, and phosphate. The partial effect plots of these smooth terms are shown in Figures 1-2.
The strongest signals, TA, DIC, and temperature, show nonlinear relationships with zooplankton abundance and CO3 has a slight positive linear effect. Most of the other variables show weak or flat relationships.
Figure 2 shows that there is higher predicted abundance near the coast, specifically in the Santa Barbara Channel, and lower predicted abundance offshore.
GAM for Small Plankton
The model has an adjusted \(R^2\) of 0.4 with an AIC of 963. There are seven significant variables at the \(\alpha=0.05\) level: longitude and latitude, pH, temperature, month, the revelle factor, silicate, and phosphate. The partial effect plots of these smooth terms are shown in Figures 3-4.
Silicate shows the most consistent pattern with a slight negative effect, however, most variables have flat or weak effects on their own.
Figure 4 shows coastal and northern regions show the highest predicted abundance whereas southern offshore regions show the lowest predicted abundance. This pattern is similar to Figure 2 although Figure 4 displays a smoother gradient from more to less abundant from shore to offshore, respectively, whereas Figure 2 displays more spatial structure. Figure 2 has slightly higher predicted values in some coastal areas, possibly due to the influence of large plankton taxa that cluster in coastal regions.
GAM models for Nearshore and Offshore data
Construct separate GAM models for Nearshore and Offshore data to assess interaction effects between pH, CO3, and OmegaAR for different regions
Unique quarters in data: 2 3 4 1
Family: gaussian
Link function: identity
Formula:
log_total_plankton ~ s(pHout, CO3out, bs = "tp") + s(pHout, OmegaARout,
bs = "tp")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.38846 0.05413 81.07 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(pHout,CO3out) 2.433 2.812 9.306 1.52e-05 ***
s(pHout,OmegaARout) 1.000 1.000 2.403 0.122
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.132 Deviance explained = 14.2%
GCV = 0.89813 Scale est. = 0.88495 n = 302
In nearshore regions, both CO₃²⁻ and OmegaAR exhibit sharp increases around pH 8.0, suggesting a threshold effect. In contrast, offshore regions display more gradual, continuous increases across the pH range, indicating smoother responses to changes in acidity. This highlights distinct regional patterns in how carbonate chemistry interacts with pH to influence plankton abundance.
PRPOOS Calcifying/Non-Calcifying Zooplankton
Effects of pH at Saturation Depth on Abundance by Season
Calcifier/Non-Calcifier Abundance vs. pH
Figures 5-6 suggest the calcifying zooplankton exhibit stronger seasonal sensitivity to pH at saturation depth than non-calcifiers, but not in the way one would expect. In the summer and fall, calcifier abundance shows a clear, negative relationship with increasing pH. Non-calcifiers, by contrast, show relatively stable or weakly negative trends across all seasons except winter where there is a slight positive trend in both calcifier and non-calcifier abundance.
Individual Calcifying Taxa vs. pH
Across all four calcifying taxa, there is a strong seasonal influence in how abundance responds to pH at the depth of the aragonite saturation horizon. Bryozoan larvae, pteropods/heterpods, and rhizaria follow similar patterns across seasons, with negative relationships between pH and abundance in fall and summer, indicating that these taxa may be most sensitive to higher pH during warmer months, and a positive relationship in winter. Ostracoda, on the other hand, appear relatively insensitive to pH fluctuations, except in spring where abundance has a positive relationship with pH.
Overall, these findings indicate that calcifying zooplankton responses to ocean acidifcation are not uniform and may be seasonally mediated. Also, they may be influenced by additional environmental factors, as supported by the GAMs.
Visualizing Saturation Depth Over Time by Station
The plot_saturation_horizon function visualizes the depth of the aragonite saturation horizon from 2009 to 2021 for a given Station ID. Blue points represent years with sufficient observations, while red points indicate limited data availability. Gray points at 0 m reflect missing values, where data were insufficient to interpolate a saturation depth. Shallower horizons (closer to the surface) indicate more corrosive conditions, whereas deeper depths are more favorable for calcifying organisms. Try changing the parameter to another Station ID to see how the depth of saturation changes over time!
GAMs for Individual Calcifying Taxa
We found that the spatial effect is significant in our GAM models when modeling for total abundance and species-specific abundance.
For this particular taxon, such as bryozoan larvae, abundance is greater near the coast, which is the case for most of the species we investigated. However, the spatial trend varies between species.
Family: gaussian
Link function: identity
Formula:
log1p(bryozoan_larvaeAbundance) ~ s(Longitude, Latitude, k = 15) +
s(pH_mean) + s(OmegaCA_mean) + s(CO3_mean) + s(TA_mean) +
s(DIC_mean) + s(RF_mean) + s(OmegaAR_mean)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.6666 0.1096 60.83 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Longitude,Latitude) 5.473 7.003 7.157 < 2e-16 ***
s(pH_mean) 5.758 7.071 1.590 0.16456
s(OmegaCA_mean) 8.551 8.728 0.840 0.58702
s(CO3_mean) 7.809 8.449 2.905 0.00721 **
s(TA_mean) 1.000 1.000 0.063 0.80139
s(DIC_mean) 1.123 1.225 0.023 0.92618
s(RF_mean) 1.536 1.898 0.184 0.82983
s(OmegaAR_mean) 4.164 5.238 0.658 0.68257
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.275 Deviance explained = 34.1%
GCV = 5.143 Scale est. = 4.6603 n = 388
Family: gaussian
Link function: identity
Formula:
log1p(cnidaria_ctenophoresAbundance) ~ s(Longitude, Latitude,
k = 15) + s(pH_mean) + s(OmegaCA_mean) + s(CO3_mean) + s(TA_mean) +
s(DIC_mean) + s(RF_mean) + s(OmegaAR_mean)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.86866 0.08623 79.65 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Longitude,Latitude) 4.375 5.667 1.983 0.062129 .
s(pH_mean) 7.668 8.498 3.442 0.004753 **
s(OmegaCA_mean) 1.000 1.000 0.929 0.335745
s(CO3_mean) 1.000 1.000 11.653 0.000715 ***
s(TA_mean) 6.887 7.912 1.779 0.077466 .
s(DIC_mean) 6.328 7.574 2.445 0.016046 *
s(RF_mean) 1.879 2.445 0.379 0.818575
s(OmegaAR_mean) 1.000 1.000 3.376 0.066983 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.292 Deviance explained = 34.8%
GCV = 3.1368 Scale est. = 2.8851 n = 388
Family: gaussian
Link function: identity
Formula:
log1p(copepoda_calanoida_minus_eucalanidsAbundance) ~ s(Longitude,
Latitude, k = 15) + s(pH_mean) + s(OmegaCA_mean) + s(CO3_mean) +
s(TA_mean) + s(DIC_mean) + s(RF_mean) + s(OmegaAR_mean)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.60855 0.03314 350.3 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Longitude,Latitude) 9.441 11.252 3.370 0.000157 ***
s(pH_mean) 3.214 4.174 1.367 0.276694
s(OmegaCA_mean) 1.000 1.000 3.599 0.058622 .
s(CO3_mean) 3.265 4.134 0.788 0.532249
s(TA_mean) 5.016 6.044 2.553 0.019225 *
s(DIC_mean) 1.000 1.000 2.965 0.085959 .
s(RF_mean) 3.110 3.832 1.056 0.365551
s(OmegaAR_mean) 1.000 1.000 4.661 0.031512 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.316 Deviance explained = 36.4%
GCV = 0.45922 Scale est. = 0.42603 n = 388
GAMs Grouped by Calcifying/Non-calcifying Taxa
The calcifying Species we are interested in are: bryozoan larvae, pteropoda heteropoda, ostracods and rhizaria.
Clustering Analysis with PCA
Identify clusters of species based on their abundance profiles Standardize the species abundance matrix Apply PCA to reduce dimensionality and visualize the dataset
PCA was applied to reduce dimensionality, focusing on the two principal components PC1 and PC2 that explain the highest variance: PC1 (28.6% variance): Captures the most substantial variation, primarily influenced by the highly variable species in Cluster 3. PC2 (9.9% variance): Captures less variance but provides vertical separation, assisting in visualizing overlapping clusters.
Figure 12 shows clear separation between clusters in PCA space, with Group 2 (blue) being the most distinct along Dim1, accounting for the largest proportion of variance (27.7%). Groups 1, 3, and 4 are more tightly clustered and overlap in the lower-left region, indicating more similar community structures. This suggests that Group 2 represents a functionally or environmentally different plankton assemblage.
Summary of Findings
Overall, calcifying zooplankton abundance is disproportionately affected by pH in the fall and summer as indicated by the steep negative slope, which is interesting because we would typically expect an overall positive trend as observed in the winter. Ostracods, in particular, seem less influenced by changes in pH, maintaining a stable abundance over the course of the seasons. This could be due to the fact that the pH range does not exhibit abnormal values of pH (i.e. very low or very high pH values). However, the GAMs showed that spatial patterns are consistently significant indicators of total, small, and species-specific zooplankton variation. While their spatial location varies from species to species, total plankton tend to congregate coastally, which aligns with findings from Part I where coastal stations seem to be less impacted by ocean acidification than offshore stations.
Future Work
A potential extension of this project could to be to create an interactive spatio-temporal mapping tool where users can specify the taxa, time, and depth range to generate dynamic heatmaps of variables such as abundance, saturation horizon depths, and pH levels. Another extension could be to employ additional modeling approaches or incorporate more nutrient-based variables that plankton are dependent upon (e.g. chlorophyll) to increase the proportion of variance explained in the total and species-specific plankton modeling.
Acknowledgements
A special thank you to our mentors, Erin Satterthwaite, Todd Martz, Brice Semmens, and Erika McPhillips, for their guidance and support.